Unsupervised Machine-Learning Pipeline for Data-Driven Defect Detection and Characterisation: Application to Displacement Cascades

Del Fré, Samuel, de Backer, Andrée, Domain, Christophe, Thuinet, Ludovic, Becquart, Charlotte S.

arXiv.org Artificial Intelligence

Neutron irradiation produces, within a few picoseconds, displacement cascades: sequences of atomic collisions that generate point and extended defects, which subsequently affect the long-term evolution of materials. The diversity of these defects, characterized morphologically and statistically, defines what is called the "primary damage". In this work, we present a fully unsupervised machine learning (ML) workflow that detects and classifies these defects directly from molecular dynamics data. Local environments are encoded by the Smooth Overlap of Atomic Positions (SOAP) vector; anomalous atoms are isolated with autoencoder neural networks (AEs), embedded with Uniform Manifold Approximation and Projection (UMAP), and clustered using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). Applied to 80 keV displacement cascades in Ni, Fe$_{70}$Ni$_{10}$Cr$_{20}$, and Zr, the AEs successfully identify the small fraction of outlier atoms that participate in defect formation. HDBSCAN then partitions the UMAP latent space of AE-flagged SOAP descriptors into well-defined groups representing vacancy- and interstitial-dominated regions and, within each, separates small from large aggregates, assigning 99.7% of outliers to compact physical motifs. A signed cluster-identification score confirms this separation, and cluster size scales with net defect counts ($R^2 > 0.89$). Statistical cross-analyses between the ML outlier map and several conventional detectors (centrosymmetry, dislocation extraction, etc.) reveal strong overlap and complementary coverage, all achieved without template or threshold tuning. This ML workflow thus provides an efficient tool for the quantitative mapping of structural anomalies in materials, particularly those arising from irradiation damage in displacement cascades.
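The AE outlier-flagging step can be illustrated with a minimal sketch. It assumes a *linear* autoencoder (fitted in closed form by SVD, which makes it equivalent to PCA) in place of the trained neural AEs, random toy vectors in place of real SOAP descriptors, and a hypothetical robust cut-off of median + k * MAD on the reconstruction error. The paper reports no threshold tuning, so this cut-off is only a stand-in. The idea shown is that atoms whose local environment the AE reconstructs poorly are candidate defect atoms; the UMAP and HDBSCAN stages are not covered here.

```python
import numpy as np


def linear_ae_outliers(X, n_latent=2, k=5.0):
    """Flag rows of X (one descriptor vector per atom) that reconstruct poorly.

    Assumptions, not taken from the paper: a linear autoencoder fitted by SVD
    (equivalent to PCA) stands in for the trained neural AE, and the cut-off
    is median + k * MAD of the reconstruction errors.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    # The leading right-singular vectors give the optimal linear encoder/decoder.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_latent].T            # (n_features, n_latent)
    Z = Xc @ W                     # encode into the latent space
    X_hat = Z @ W.T + mu           # decode back to descriptor space
    err = np.sum((X - X_hat) ** 2, axis=1)
    med = np.median(err)
    mad = np.median(np.abs(err - med))
    return err > med + k * mad, err


# Toy stand-in for SOAP vectors: "bulk" atoms near a 2-D subspace of a
# 10-D descriptor space, plus three off-subspace "defect" environments.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 10))
bulk = rng.normal(size=(1000, 2)) @ basis + 0.05 * rng.normal(size=(1000, 10))
defects = rng.normal(size=(3, 10))
X = np.vstack([bulk, defects])

mask, err = linear_ae_outliers(X, n_latent=2)
print(f"flagged {mask.sum()} of {len(X)} atoms")
```

A neural AE replaces the linear encoder/decoder with nonlinear layers trained by gradient descent, but the flagging logic (large reconstruction error means unusual environment) stays the same.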


Cyber Racing Coach: A Haptic Shared Control Framework for Teaching Advanced Driving Skills

Shen, Congkai, Yu, Siyuan, Weng, Yifan, Ma, Haoran, Li, Chen, Yasuda, Hiroshi, Dallas, James, Thompson, Michael, Subosits, John, Ersal, Tulga

arXiv.org Artificial Intelligence

This study introduces a haptic shared control framework designed to teach human drivers advanced driving skills. In this context, shared control refers to a driving mode in which the human driver and an autonomous driving system simultaneously control the steering of a vehicle. Advanced driving skills are those necessary to safely push the vehicle to its handling limits in high-performance driving, such as racing and emergency obstacle avoidance. Previous research has demonstrated the performance and safety benefits of shared control schemes using both subjective and objective evaluations. However, these schemes have not been assessed for their impact on skill acquisition in complex and demanding tasks. Prior research on long-term skill acquisition either applies haptic shared control to simple tasks or employs other feedback methods, such as visual and auditory aids. To bridge this gap, this study creates a cyber racing coach framework based on the haptic shared control paradigm and evaluates its performance in helping human drivers acquire high-performance driving skills. The framework introduces (1) an autonomous driving system that is capable of cooperating with humans in a high-performance driving scenario; and (2) a haptic shared control mechanism, along with a fading scheme that gradually reduces the steering assistance from the autonomy based on the human driver's performance during training. Two benchmarks are considered: self-learning (no assistance) and full assistance during training. Results from a human subject study indicate that the proposed framework helps human drivers develop superior racing skills compared to the benchmarks, resulting in better performance and consistency.

Advanced driving skills refer to a set of competencies that go beyond basic driving abilities in terms of situational awareness, hazard perception, risk management, and vehicle handling [1]. They are crucial in high-performance driving tasks such as racing, and can also improve safety in everyday driving [1], [2]. This work has been submitted to the IEEE for possible publication.
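The fading scheme can be sketched as follows. This is a minimal illustration under assumptions not taken from the paper: shared control is reduced to a convex blend of a human and an autonomy steering command (real haptic shared control acts through torques on the steering wheel), and the per-session error measure, threshold, and step size are hypothetical. The autonomy's share `alpha` shrinks after good sessions and partly recovers after poor ones.

```python
from dataclasses import dataclass


@dataclass
class FadingAssist:
    """Blend human and autonomy steering; fade the autonomy's share as the human improves.

    Hypothetical parameters and update rule, not the paper's.
    """
    alpha: float = 1.0          # autonomy's share of authority (1.0 = full assistance)
    step: float = 0.25          # authority removed after a good session
    target_error: float = 0.5   # session error at or below which a session counts as good

    def blend(self, human_cmd: float, auto_cmd: float) -> float:
        """Steering command applied to the vehicle (a convex combination)."""
        return self.alpha * auto_cmd + (1.0 - self.alpha) * human_cmd

    def update(self, session_error: float) -> float:
        """Fade assistance after a good session; restore some after a poor one."""
        if session_error <= self.target_error:
            self.alpha = max(0.0, self.alpha - self.step)
        else:
            self.alpha = min(1.0, self.alpha + self.step / 2)
        return self.alpha


coach = FadingAssist()
for lap_error in [0.4, 0.3, 0.9, 0.2]:   # made-up per-session errors
    print(coach.update(lap_error), coach.blend(human_cmd=0.2, auto_cmd=0.1))
```

Tying the fade to measured performance, rather than to a fixed schedule, is what lets the assistance track each driver's learning pace; the specific rule here is only one possible choice.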


Cross-Platform DNA Methylation Classifier for the Eight Molecular Subtypes of Group 3 & 4 Medulloblastoma

Abid, Omer, Rafiee, Gholamreza

arXiv.org Artificial Intelligence

Medulloblastoma is a malignant pediatric brain cancer, and the discovery of molecular subgroups is enabling personalized treatment strategies. In 2019, a consensus identified eight novel subtypes within Groups 3 and 4, each displaying heterogeneous characteristics. Classifiers are essential for translating these findings into clinical practice by supporting clinical trials, personalized therapy development and application, and patient monitoring. This study presents a DNA methylation-based, cross-platform machine learning classifier capable of distinguishing these subtypes on both HM450 and EPIC methylation array samples. Across two independent test sets, the model achieved a weighted F1 of 0.95 and a balanced accuracy of 0.957, consistent across platforms. As the first cross-platform solution, it provides backward compatibility while extending applicability to a newer platform, also enhancing accessibility. It also has the potential to become the first publicly available classifier for these subtypes once deployed through a web application, as planned for the future. Overall, this work takes steps toward advancing precision medicine and improving clinical outcomes for patients within the most prevalent medulloblastoma subgroups, Groups 3 and 4.

Keywords: Medulloblastoma, Molecular Subgroup Classification, Machine Learning, AI for Health

Medulloblastoma is a malignant brain cancer widely known for its prevalence in children. Through extensive treatment strategies based on surgery, chemotherapy and radiation therapy, approximately 75% of patients survive in the long term [1]. These treatments, while crucial, also come with negative side effects affecting patients' lives [1], [2], especially considering the implications for growing children. However, with advances in genomics, molecular subgroups have been discovered within the disease. These subgroups have been shown to be heterogeneous from clinical, biological and outcome perspectives [3]. They are in fact now considered a better definition of disease behaviour than conventional techniques [3].
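The two reported metrics can be made concrete with a small pure-Python sketch. Balanced accuracy averages per-class recall, so a rare subtype counts as much as a common one; weighted F1 averages per-class F1 scores, weighted by each class's support. The labels below are toy values standing in for subtypes, not data from the study. scikit-learn's `balanced_accuracy_score` and `f1_score(..., average="weighted")` compute the same quantities.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so rare subtypes count as much as common ones."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c, n_c in support.items():
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = n_c - tp
        score += (2 * tp / (2 * tp + fp + fn)) * n_c / total
    return score


# Toy labels for three hypothetical subtypes.
y_true = ["I", "I", "II", "II", "II", "III"]
y_pred = ["I", "II", "II", "II", "II", "III"]
print(balanced_accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

Here subtype I has recall 1/2, so balanced accuracy is (0.5 + 1 + 1) / 3 = 5/6, lower than the plain accuracy of 5/6 would suggest only when classes are more imbalanced; with eight subtypes of very different sizes, the two metrics can diverge substantially.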


Predicting person-level injury severity using crash narratives: A balanced approach with roadway classification and natural language process techniques

Majidi, Mohammad Zana, Karimi, Sajjad, Wang, Teng, Kluger, Robert, Souleyrette, Reginald

arXiv.org Artificial Intelligence

Predicting injuries and fatalities in traffic crashes plays a critical role in enhancing road safety, improving emergency response, and guiding public health interventions. This study investigates the added value of unstructured crash narratives (written by police officers at the scene) when combined with structured crash data to predict injury severity. Two widely used Natural Language Processing (NLP) techniques, Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec, were employed to extract semantic meaning from the narratives, and their effectiveness was compared. To address the challenge of class imbalance, a K-Nearest Neighbors-based oversampling method was applied to the training data prior to modeling. The dataset consists of crash records from Kentucky spanning 2019 to 2023. To account for roadway heterogeneity, three road classification schemes were used: (1) eight detailed functional classes (e.g., Urban Two-Lane, Rural Interstate, Urban Multilane Divided), (2) four broader paired categories (e.g., Urban vs. Rural, Freeway vs. Non-Freeway), and (3) a unified dataset without classification. A total of 102 machine learning models were developed by combining structured features and narrative-based features using the two NLP techniques alongside three ensemble algorithms: XGBoost, Random Forest, and AdaBoost. Results demonstrate that models incorporating narrative data consistently outperform those relying solely on structured data. Among all combinations, TF-IDF coupled with XGBoost yielded the most accurate predictions in most subgroups. The findings highlight the power of integrating textual and structured crash information to enhance person-level injury prediction. This work offers a practical and adaptable framework for transportation safety professionals to improve crash severity modeling, guide policy decisions, and design more effective countermeasures.
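A minimal TF-IDF sketch shows how narrative terms become numeric features. It assumes whitespace tokenization, raw-frequency TF and an unsmoothed IDF, idf(t) = ln(N / df(t)): a term that appears in every narrative gets zero weight, while rarer, more distinctive terms get larger weights. The narratives are invented examples. scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalizes rows by default, so its numbers would differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (raw TF x unsmoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many narratives each term occurs at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


narratives = [  # invented examples, not from the Kentucky data
    "driver ejected vehicle overturned",
    "vehicle struck guardrail driver",
    "driver complained of pain",
]
vecs = tfidf(narratives)
print(sorted(vecs[0].items(), key=lambda kv: -kv[1]))
```

In the first narrative, "ejected" (one document) outweighs "vehicle" (two documents), and "driver" (all documents) contributes nothing; such sparse vectors can then be concatenated with the structured crash features before fitting a classifier.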



Facility Location with Public Locations and Private Doubly-Peaked Costs

Cole, Richard, Jangir, Pranav

arXiv.org Artificial Intelligence

In the facility location problem, the task is to place one or more facilities so as to minimize the sum of the agent costs for accessing their nearest facility. Heretofore, in the strategic version, agent locations have been assumed to be private, while their cost measures have been public and identical. For the most part, the cost measure has been the distance to the nearest facility. However, in multiple natural settings, such as placing a firehouse or a school, this modeling does not appear to be a good fit. For it seems natural that the agent locations would be known, but their costs might be private information. In addition, for these types of settings, agents may well want the nearest facility to be at the right distance: near, but not too near. This is captured by the doubly-peaked cost introduced by Filos-Ratsikas et al. (AAMAS 2017). In this paper, we re-examine the facility location problem from this perspective: known agent locations and private preferred distances to the nearest facility. We then give lower and upper bounds on achievable approximations, focusing on the problem in 1D, and in 2D with an $L_1$ distance measure.
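To make "near, but not too near" concrete, here is a minimal sketch of a doubly-peaked cost, assuming the simplest symmetric piecewise-linear form: an agent at location x with preferred distance d pays |dist(x, nearest facility) - d|. With one facility on a line and d > 0, the cost is zero at exactly two facility positions, x - d and x + d, hence the two peaks. The exact cost shape in the paper (and in Filos-Ratsikas et al.) may differ, and this code only evaluates costs; it is not a mechanism for eliciting the private distances.

```python
def l1(p, q):
    """L1 (Manhattan) distance between two points given as tuples."""
    return sum(abs(a - b) for a, b in zip(p, q))


def agent_cost(location, preferred, facilities, metric=l1):
    """Deviation of the distance to the nearest facility from the preferred distance.

    Assumed piecewise-linear doubly-peaked form; the paper's may differ.
    """
    nearest = min(metric(location, f) for f in facilities)
    return abs(nearest - preferred)


def social_cost(locations, preferred, facilities, metric=l1):
    """Sum of agent costs: the objective a mechanism tries to approximate."""
    return sum(agent_cost(x, d, facilities, metric) for x, d in zip(locations, preferred))


# 1-D: an agent at 0 preferring distance 2 is equally happy with a facility
# at -2 or +2 (the two peaks) and pays 2 if the facility sits on top of it.
print([social_cost([(0,)], [2], [(y,)]) for y in (-2, 0, 2)])   # [0, 2, 0]

# 2-D with the L1 metric: a facility at (1, 2) is exactly 3 away from (0, 0).
print(social_cost([(0, 0)], [3], [(1, 2)]))                     # 0
```

Points are tuples so the same functions cover the 1D and the 2D $L_1$ settings the abstract focuses on.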


Using customized GPT to develop prompting proficiency in architectural AI-generated images

Rodriguez, Juan David Salazar, Joyce, Sam Conrad, Julfendi

arXiv.org Artificial Intelligence

This research investigates the use of customized GPT models to enhance prompting proficiency among architecture students when generating AI-driven images. Prompt engineering is increasingly essential in architectural education due to the widespread adoption of generative AI tools. This study utilized a mixed-methods experimental design involving architecture students divided into three distinct groups: a control group receiving no structured support, a second group provided with structured prompting guides, and a third group supported by both structured guides and interactive AI personas. Students engaged in reverse engineering tasks, first guessing provided image prompts and then generating their own prompts, aiming to boost critical thinking and prompting skills. Variables examined included time spent prompting, word count, prompt similarity, and concreteness. Quantitative analysis involved correlation assessments between these variables and a one-way ANOVA to evaluate differences across groups. While several correlations showed meaningful relationships, not all were statistically significant. ANOVA results indicated statistically significant improvements in word count, similarity, and concreteness, especially in the group supported by AI personas and structured prompting guides. Qualitative feedback complemented these findings, revealing enhanced confidence and critical thinking skills in students. These results suggest tailored GPT interactions substantially improve students' ability to communicate architectural concepts clearly and effectively.
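The one-way ANOVA used to compare the three groups reduces to a ratio of between-group to within-group mean squares. A minimal pure-Python sketch follows; the word counts are invented to show the arithmetic and are not the study's data. `scipy.stats.f_oneway` computes the same F statistic together with its p-value.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group vs. within-group mean squares."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented prompt word counts: control, guides, guides + AI personas.
control = [10, 12, 14]
guides = [15, 17, 19]
guides_personas = [22, 24, 26]
f_stat, df1, df2 = one_way_anova_f([control, guides, guides_personas])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

With group means 12, 17 and 24, the between-group sum of squares is 218 and the within-group sum is 24, giving F(2, 6) = (218 / 2) / (24 / 6) = 27.25; a large F relative to the F(2, 6) distribution indicates that at least one group mean differs.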


Concept Space Alignment in Multilingual LLMs

Peng, Qiwei, Søgaard, Anders

arXiv.org Artificial Intelligence

Multilingual large language models (LLMs) seem to generalize somewhat across languages. We hypothesize this is a result of implicit vector space alignment. Evaluating such alignment, we see that larger models exhibit very high-quality linear alignments between corresponding concepts in different languages. Our experiments show that multilingual LLMs suffer from two familiar weaknesses: generalization works best for languages with similar typology, and for abstract concepts. For some models, e.g., the Llama-2 family of models, prompt-based embeddings align better than word embeddings, but the projections are less linear -- an observation that holds across almost all model families, indicating that some of the implicitly learned alignments are broken somewhat by prompt-based methods.
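One common way to test for linear alignment is to fit a linear map from one language's concept embeddings to another's on training pairs, then check how often a held-out mapped vector's nearest neighbour (by cosine) is its true counterpart (precision@1). The NumPy sketch below assumes this least-squares protocol on synthetic embeddings, where language B is simply a rotated, slightly noisy copy of language A; the paper's exact alignment method and evaluation may differ (an orthogonal Procrustes map is a frequent alternative).

```python
import numpy as np


def fit_linear_map(X, Y):
    """Least-squares W minimising ||X W - Y||_F (rows of X, Y are paired concepts)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def precision_at_1(X, Y, W):
    """Share of mapped source vectors whose nearest target (cosine) is the true pair."""
    P = X @ W
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    nearest = (P @ Yn.T).argmax(axis=1)
    return float(np.mean(nearest == np.arange(len(X))))


# Synthetic "concept embeddings": language B is a rotated, noisy copy of A.
rng = np.random.default_rng(0)
dim, n_train, n_test = 16, 200, 50
X = rng.normal(size=(n_train + n_test, dim))
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
Y = X @ Q + 0.01 * rng.normal(size=X.shape)

W = fit_linear_map(X[:n_train], Y[:n_train])
p1 = precision_at_1(X[n_train:], Y[n_train:], W)
print(f"precision@1 on held-out concepts: {p1:.2f}")
```

On real LLM embeddings, a high held-out precision@1 under such a map is evidence of (near-)linear alignment; the gap between a linear and a more flexible map is one way to probe the "less linear" projections the abstract reports for prompt-based embeddings.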


DMTG: One-Shot Differentiable Multi-Task Grouping

Gao, Yuan, Jiang, Shuguo, Li, Moran, Yu, Jin-Gang, Xia, Gui-Song

arXiv.org Machine Learning

We aim to address Multi-Task Learning (MTL) with a large number of tasks by Multi-Task Grouping (MTG). Given N tasks, we propose to identify the best task groups from 2^N candidates and train the model weights simultaneously, in one shot, with the high-order task affinity fully exploited. This is distinct from the pioneering methods, which sequentially identify the groups and then train the model weights, and in which group identification often relies on heuristics. As a result, our method not only improves training efficiency but also mitigates the objective bias introduced by sequential procedures, which can lead to suboptimal solutions. Specifically, we formulate MTG as a fully differentiable pruning problem on an adaptive network architecture determined by an underlying Categorical distribution. To categorize N tasks into K groups (represented by K encoder branches), we initially set up KN task heads, where each branch connects to all N task heads to exploit the high-order task affinity. Then, we gradually prune the KN heads down to N by learning a relaxed differentiable Categorical distribution, ensuring that each task is exclusively and uniquely assigned to one branch. Extensive experiments on the CelebA and Taskonomy datasets with detailed ablations show the promising performance and efficiency of our method. The code is available at https://github.com/ethanygao/DMTG.
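The pruning from KN heads to N can be illustrated with a relaxed Categorical over branches for each task. Gumbel-softmax is a standard such relaxation and is assumed here; the paper's exact parameterization may differ. The NumPy sketch only shows sampling soft task-to-branch assignments and reading off the surviving head per task; in training, the logits would be learned by backpropagation in an autodiff framework, typically while lowering the temperature `tau`. The logits below are hypothetical.

```python
import numpy as np


def gumbel_softmax(logits, tau, rng):
    """Relaxed sample from a Categorical over branches, one row per task."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    y = (logits - np.log(-np.log(u))) / tau   # add Gumbel(0, 1) noise, then temper
    y = y - y.max(axis=-1, keepdims=True)     # numerically stable softmax
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


def surviving_heads(logits):
    """Keep one head per task: (branch k, task n) with the largest logit."""
    n_tasks, n_branches = logits.shape
    keep = np.zeros((n_branches, n_tasks), dtype=bool)   # the K x N grid of heads
    keep[logits.argmax(axis=1), np.arange(n_tasks)] = True
    return keep


rng = np.random.default_rng(0)
# Hypothetical learned logits for N = 4 tasks over K = 2 branches.
logits = np.array([[3.0, 0.0], [2.5, 0.0], [0.0, 3.0], [0.0, 2.0]])
weights = gumbel_softmax(logits, tau=0.5, rng=rng)   # (N, K) soft assignments
keep = surviving_heads(logits)
print(weights.round(3))
print(keep.astype(int))
```

During training, each task's output would be a weighted mix of its K heads using these soft weights; once the distribution has sharpened, exactly one head per column of the K x N grid survives, so each task belongs to a single branch.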